A Study of Phone Recognizer Combination for Higher Accuracy in TIMIT Phone Recognition
Abstract
A phone recognition system generally contains only a single phone recognizer, whose phone set and speech representation are optimized for a particular task. This paper studies the effect of phone sets and speech representations on the TIMIT phone recognition task. Two phone sets (the original TIMIT phone set and the classical 39-phone set) and two speech representations (MFCC-based and PLP-based) are tested. A phone recognizer for each combination of phone set and speech representation is evaluated and analyzed on both the TIMIT training and test sets. The results show that the recognizer using the classical 39-phone set with the PLP representation yields the highest phone accuracy. However, this best recognizer works well only on a particular subset of phones, while the other recognizers work better on other subsets. The results therefore indicate that a combination of several phone recognizers can achieve higher phone accuracy than any single phone recognizer.
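The abstract compares recognizers by phone accuracy without defining it. The sketch below assumes the commonly used definition, accuracy = (N - S - D - I) / N, where N is the number of reference phones and S, D, I are the substitutions, deletions, and insertions in an alignment of the hypothesis against the reference. It uses a plain unit-cost edit distance; scoring tools may weight the alignment differently, and the function names and example sequences here are illustrative only, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Unit-cost Levenshtein distance between two phone label sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference phones with an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_accuracy(ref, hyp):
    """Phone accuracy in percent: 100 * (N - S - D - I) / N."""
    n = len(ref)
    if n == 0:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * (n - edit_distance(ref, hyp)) / n


# Toy sequences (made up for illustration): one substitution, one insertion.
ref = ["sil", "dh", "ih", "s", "sil"]
hyp = ["sil", "d", "ih", "s", "s", "sil"]
print(phone_accuracy(ref, hyp))
```

Because insertions count against the score, this measure can fall below zero when the hypothesis is much longer than the reference.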
Similar resources
High performance speaker-independent phone recognition using CDHMM
In this paper we report high phone accuracies on three corpora: WSJ0, BREF and TIMIT. The main characteristics of the phone recognizer are: high dimensional feature vector (48), context- and gender-dependent phone models with duration distribution, continuous density HMM with Gaussian mixtures, and n-gram probabilities for the phonotactic constraints. These models are trained on speech data that hav...
Applications of virtual-evidence based speech recognizer training
We present two applications of our previously proposed virtual-evidence (VE) based speech recognizer training algorithm [1, 2]. The first relates to two-pass training where segmentations obtained during the first pass are used as VE to train the subsequent pass. We use the TIMIT phone and SVitchboard continuous speech recognition tasks to demonstrate the benefits of using VE based training in tw...
Speech Recognition Using a Discriminative, Context-Independent, Segment-Based Speech Recognizer
In this paper, we describe important improvements that were recently introduced in our Discriminative Stochastic Segment Model (DSSM) speech recognizer. We propose a new presegmentation algorithm and we optimize the structure of the Multi-Layer Perceptron (MLP) that estimates the phone probabilities. Additionally, we describe a cascade MLP combination technique that relaxes the drawbacks of ...
Directed graphical models of classifier combination: application to phone recognition
Classifier combination is a technique that often provides appreciable accuracy gains. In this paper, we argue that the underlying statistical model of classifier combination should be made explicit. Using directed graphical models (DGMs), we provide representations of two common combination schemes, the mean and product rules. We also introduce new DGMs that yield novel combination rules. We fi...
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Publication date: 2006